A classical problem for natural language processing systems is the 
high redundancy, in terms of dictionary entries, of the set of words 
encountered in raw text. According to highly productive rules, for 
example, all but a handful of English nouns have plural forms differing 
(in spelling) from the singular only in the presence of a final (e)s; 
third person singular verb forms differ from the infinitive in the same 
way; most adjectives have regular comparative and superlative forms. 
Somewhat less productive rules generate various affixational derivatives 
of lexical stems, such as happiness, modernize, continental. A dictionary 
which contained separate entries for all these forms would clearly be 
highly inefficient. Some sort of morphological analysis must be employed 
by any system general enough to be useful. 

In the literature of computational linguistics, three general 
approaches to the morphological analysis problem have developed. In 
projects directed toward content analysis [1,2,3,4] affixes (generally 
suffixes) have been removed and in effect discarded, leaving only stems 
for consideration. Where the orientation was syntactic [5,6,7] affixes 
have been treated as heuristic clues for automatic part-of-speech 
classification, in an attempt to avoid dictionary look-up altogether. The 
most sophisticated and ambitious attempts at a generalized natural 
language processing capability [8,9,10] have recognized the ultimate 
necessity of considering stems and affixes combinatorially as syntactically 
and semantically functioning elements, and have programmed dictionary 
look-ups which assign syntactic (and semantic) codes to stems and modify 
the syntactic code according to the final suffix, if any. 
Even the latter efforts, however, produced as output encodings of 
words. A generative grammar as generally conceived has morphemes for 
terminal elements, and a syntactic analysis procedure based on such a 
grammar has as its first task the decomposition of an input string into 
its component morphemes. The present procedure is one approach to the 
performance of that task. It is designed to reduce morphologically 
complex English words to stems in canonical or dictionary form, plus 
affixes, inflectional and derivational, represented as morphemes or as 
syntactic features of the stem. Each morpheme of the input string 
should have a distinct representation for purposes of further analysis; 
consequently the task of the procedure includes analyzing as many 
nested levels of affixation as a word may contain. In this respect 
also it differs from the procedures described in references 1-10. 

The overall strategy is as follows. A set of analysis rules (AN) 
decompose words of the input string into presumable stems and 
presumable affixes.¹ These rules are based on characteristic spelling of 
the affixes. The analysis rules also restore the stem to its dictionary 
form if its spelling was deformed by affixation (for example, 
skating would be analyzed as skate+(ING)).² The presumable stem is then 
looked up in the lexicon, and all of its lexical entries are associated 
with it. Thus skate might be categorized as a count noun and an 
intransitive verb. Next a set of morpheme-combinatorial rules (MC) apply. The 
MC relate the set of lexical entries assigned to the stem to the 
presumable affixes discovered by the AN, and modify the lexical entries 
accordingly. So if the input word had been skates, analyzed as skate+(S), 
one MC rule would modify the noun entry for skate to indicate plurality, 
and another would modify the verb entry to (third person) singular. 
Finally a set of redundancy rules (RD) fill in additional syntactic 
information not specified by the lexical entry or the MC. Thus if the 
input word was skate, one RD rule would indicate that the noun entry 
should be marked singular, and another would mark the verb as plural. 

The output of the RD is a fully categorized string of morphemes which 
serve as input to the syntactic analysis procedure. 

¹ The linguistic nature of English affixation apparently makes it possible 
to analyze as many nested levels of affixation as a word may contain in a 
single pass through the analysis rules, if provision is made for internal 
looping in the case of certain suffixes. See [11] for discussion. 

² An alternative to restoring analyzed stems to their canonical forms 
would be to use a lexicon of deformed stems, with lexical look-up 
sometimes operating on the basis of partial rather than complete match 
with the input word. Thus instead of restoring the stem-final e in the 
analysis of skating, the AN would yield skat+(ING) and the matching 
lexical entry would be skat. Skate as an input word would then also match 
skat. This is essentially the procedure followed in [9]. Such an 
approach is especially attractive in the analysis of languages whose 
canonical forms are bimorphemic, e.g., Russian, Spanish, etc. It is not 
without its problems, however. For example, how are wag and wage, fin 
and fine, mat and mate, etc., kept distinct? 

In the ideal case application of the AN would yield exactly the 
right morphological analysis of every input string; that is, the 
presumable stems and affixes always would be the actual stems and affixes. 
In practice, of course, the ideal is unattainable because of the 
problem of homography. Spurious analyses will result when a word has a 
spelling characteristic of a class of derivatives without being a member 
of that class, e.g., herring. The AN can be designed to avoid making a 
certain number of wrong analyses by specification of minimal stem length 
and inadmissible stem spelling. Thus if the AN for -ing only applies 
when at least two letters precede the final -ing, analysis of sing, wing, 
etc., is avoided, and removal of the final -s from mess, buss, etc. is 
avoided by not allowing the AN for -s to apply when the penultimate 
letter is also s. However, a large number of spurious analyses cannot 
be avoided by such qualifications. 
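
The two spelling restrictions just mentioned can be sketched in a few lines of Python. This is only an illustration of the idea, not the MORPH rule format; the function names and the `min_stem` parameter are hypothetical.

```python
def may_strip_ing(word: str, min_stem: int = 2) -> bool:
    """True if an -ing rule may fire: at least min_stem letters precede -ing."""
    return word.endswith("ing") and len(word) - 3 >= min_stem


def may_strip_s(word: str) -> bool:
    """True if an -s rule may fire: final s whose preceding letter is not s."""
    return len(word) >= 2 and word.endswith("s") and word[-2] != "s"


# sing and wing are protected by the stem-length restriction,
# but herring is not: the restriction alone cannot reject it.
for w in ("sing", "wing", "skating", "herring"):
    print(w, may_strip_ing(w))
for w in ("mess", "buss", "skates"):
    print(w, may_strip_s(w))
```

Note that herring still passes the test, which is why the further rejection mechanisms are needed.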

There are two mechanisms within the present procedure for rejecting 
spurious analyses. The first and simpler rejects an analysis when the 
presumable stem is not found in the lexicon. Thus if one of the AN 
detects the final -ly characteristic of adverbs derived from adjectives, 
some of the words to which it will apply are philately, contumely, homily, 
and family. However, their presumable stems (*philate, *contume, *homy, 
and *famy, with y substituted for i by the rule which gives happy as the 
stem of happily) are not English words and will not be found in the 
lexicon. The presumable analyses will therefore be rejected. 

It can also happen that a spurious presumable analysis yields a stem 
which is in the lexicon. We may take witness as an example. The analysis 
rule which removes the -ness characteristic of nominalized adjectives (e.g., 
rudeness) will analyze witness as wit+(NESS). Wit is a valid English stem, 
but it is a noun. After it has been so categorized by lexical look-up, the 
resulting N + NESS goes to the MC. There is an MC rule for ADJ + NESS, but 
none for N + NESS. Since no MC rule applies, the analysis is rejected as 
spurious. 

Each time a presumable analysis is made, the configuration immedi- 
ately prior to the analysis is saved as an alternative presumable 
analysis. The result of application of all the AN is thus a set of 
presumable analyses, some of which will be rejected. 
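
The two rejection mechanisms described above can be illustrated with a small Python sketch. The toy lexicon, its category assignments, and the table of admissible category-suffix pairs are assumptions made for the example; they are not the MORPH lexicon or its MC rules.

```python
# Hypothetical toy lexicon: stem -> set of category labels.
LEXICON = {
    "rude": {"ADJ"},
    "happy": {"ADJ"},
    "wit": {"N"},
    "witness": {"N", "V"},
    "family": {"N"},
}

# Hypothetical stand-in for the MC: admissible (category, suffix) pairs.
MC_PAIRS = {("ADJ", "NESS"), ("ADJ", "LY")}


def surviving(analyses):
    """Keep only the presumable analyses that pass both checks.

    analyses is a list of (stem, suffix) pairs; suffix is None for the
    saved, unanalyzed configuration of the word.
    """
    kept = []
    for stem, suffix in analyses:
        # Mechanism 1: a stem absent from the lexicon yields no categories.
        for cat in sorted(LEXICON.get(stem, ())):
            # Mechanism 2: the category must combine with the suffix.
            if suffix is None or (cat, suffix) in MC_PAIRS:
                kept.append((stem, suffix, cat))
    return kept


print(surviving([("witness", None), ("wit", "NESS")]))
print(surviving([("family", None), ("famy", "LY")]))
print(surviving([("rudeness", None), ("rude", "NESS")]))
```

In this sketch wit+(NESS) is rejected by the second check and famy+(LY) by the first, while rude+(NESS) survives.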

Implementation 

The procedure, designated MORPH, has been implemented on the IBM 
System 360 in TREET, a list-processing system [12]. It has been designed 
to serve as part of an experimental text-processing system, in which it 
provides the input to a transformational syntactic analysis procedure. 
Since the grammar on which this procedure is based deals with grammatical 
structure below the word level entirely in terms of syntactic features, 
the present version of MORPH represents morphemically complex words as 
stems with associated feature-value pairs, which may include inherent 
features of the stem as well as features corresponding to the various 
discovered affixes. A fairly trivial modification of MORPH would be 
required to represent words as strings of morphemes. 

The operation of the analysis rules can be indicated in detail by a 
description of the syntax of a rule. Each AN rule is actually a state- 
ment in a variant of the string-processing language METEOR, for which an 
interpreter has been written in TREET, and the set of rules comprises a 
METEOR program. METEOR has been adapted for this use in MORPH by 
eliminating those capabilities not needed for the task at hand, and by 
including some new features which prove useful. Thus while the reader 
may profit by referring to the detailed description of the METEOR syntax 
[13], differences will be apparent in the discussion of the analysis 
rule format, which follows. 

(INGS $SAVE $REV ($0 (S) G N I $2) (1 2 (ING) 6) EINSDF END) 

Figure 1 

Sample Analysis Rule 

An analysis rule (Figure 1) consists of seven fields, only two of 
which are obligatory. The first field is an optional name for the rule, 
by which it can be referenced in another rule. The second is an optional 
occurrence of the symbol $SAVE (this symbol cannot be used as a name) 
which, if present, causes the program to save the partial results of the 
analysis prior to the application of the rule. Saving does not occur if 
the rule does not apply. The third field is an optional occurrence of 
another special symbol, $REV, which, if present, specifies reversal of the 
elements to be analyzed, before application of the rule. This results in 
a right-to-left analysis. The fourth and fifth fields are lists used for 
the actual specification of the rule. These are required, and will be 
described shortly. The sixth field is an optional name of some other 
rule (i.e., a symbol in the first field of another rule) which will be 
tried next if the present rule applies. If the symbol END is in this 
field, no more analysis rules will be tried if the present rule success- 
fully applies. The seventh field, also optional, cannot be present unless 
the sixth field is included. It specifies a rule to be branched to if 
the present rule fails. END may be used in this field also. In the 
absence of direction from these last two fields, control passes to the 
next rule in sequence, if any. 

The rule specification fields describe a "before" and "after" 
situation, respectively. Thus the second specifies operations to be 
performed if the first field finds a match. Elements of these fields 
are called constituents, and the two fields are called the left-half and 
the right-half of the rule. These fields are processed from left to 
right. 

Possible left-half constituents include: 

a) an element (a letter or a list not specifically described below): 
This must match identically, as a single constituent. A letter matches 
a letter of the input word. A list matches a previously analyzed affix. 

b) a list of elements, headed by the symbol $OR: One element must 
match. 

c) a list of elements, headed by the symbol $NOT: Any element not 
in the list will match. 

d) a symbol of the form $n, where n is a digit: Any consecutive n 
elements will match. This is used to require at least n elements before 
a suffix, for instance. n may be 0; $0 constrains the match to start at 
the left boundary of a word. 

e) the symbol $$: This matches the right terminal boundary of a 
word. ("Right", and "left" in the above discussion of $0, is defined 
after the effect of an occurrence of $REV. Due to the availability of 
the $REV option, $$ is seldom needed.) 

f) the symbol $: This matches any number of arbitrary elements. 

g) a digit n: This matches whatever the nth constituent matched. 
Obviously it must be at least the (n+1)st constituent. 

h) a list whose first member is $FN: This is a provision to allow 
more complex restrictions than $OR or $NOT. See [13] for details. 

Right-half constituents specify the new appearance of matching 
structure. They include: 

a) letters or lists, which are inserted as constituents. 

b) digits n, which specify the retention (and new location) of the 
nth left-half constituent. The absence of a digit corresponding to some 
left-half constituent indicates the deletion of that constituent. 
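
The left-half and right-half mechanics can be made concrete with a minimal Python sketch. It is not the METEOR interpreter: it supports only $0, literal elements, $OR and $NOT, always anchors the match at the left boundary, and ignores rule names, $SAVE and branching. The tuple encoding of affixes and the names `match_left` and `apply_rule` are assumptions of the sketch.

```python
# Elements are single letters ("H") or analyzed affixes (tuples such as ("S",)).

def match_left(left, elems):
    """Match left-half constituents against the start of elems.

    Returns (pieces, consumed), where pieces[i] lists what constituent
    i+1 matched, or None if the rule does not apply.
    """
    pieces, pos = [], 0
    for c in left:
        if c == "$0":                       # zero elements: pins the left boundary
            if pos != 0:
                return None
            pieces.append([])
            continue
        if pos >= len(elems):
            return None
        e = elems[pos]
        if isinstance(c, tuple) and c[0] == "$OR":
            ok = e in c[1:]
        elif isinstance(c, tuple) and c[0] == "$NOT":
            ok = e not in c[1:]
        else:                               # literal letter or affix list
            ok = e == c
        if not ok:
            return None
        pieces.append([e])
        pos += 1
    return pieces, pos


def apply_rule(left, right, elems, rev=True):
    """Apply one rule; rev=True mimics $REV (right-to-left analysis)."""
    work = elems[::-1] if rev else list(elems)
    m = match_left(left, work)
    if m is None:
        return None
    pieces, pos = m
    out = []
    for r in right:
        if isinstance(r, int):              # digit n: keep what constituent n matched
            out.extend(pieces[r - 1])
        else:                               # letter or list: insert it
            out.append(r)
    out.extend(work[pos:])                  # unmatched remainder is left unchanged
    return out[::-1] if rev else out


# Left and right halves of an -s rule and an -ing + -s rule.
RULE_S = (["$0", "S", ("$NOT", "I", "O", "S", "U")], [1, ("S",), 3])
RULE_INGS = (["$0", ("S",), "G", "N", "I"], [1, 2, ("ING",)])

step1 = apply_rule(*RULE_S, list("HOLDINGS"))
step2 = apply_rule(*RULE_INGS, step1)
print(step1)
print(step2)
```

Because the word is reversed before matching, constituent 1 ($0) is the end of the word, and the inserted affix lists come out to the right of the stem once the result is reversed back.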

((V) NIL ((D .)) (V (TNS PST)) (ADJ) ) 

Figure 2 

Sample Morpheme-combinatorial Rule 

A morpheme-combinatorial rule (Figure 2) begins with a categorization, 
which presently is restricted to a category label and zero or one 
feature-value pairs indicating subcategorization (e.g., a rule may apply 
only to transitive verbs). The second and third fields are lists of 
prefixes and suffixes, respectively. One of these fields must be NIL; i.e., 
a rule cannot refer to both prefixes and suffixes. These three fields 
specify the stem categorization and affix structure to which the rule 
applies. The non-NIL affix list has the following form: The list begins 
with the affix nearest the stem. Each affix is represented by a list 
(e.g. "(D)" for -ed), and each list may have an optional second member of 
"*" or ".". The "*" specifies a match with a flagged (i.e., previously 
matched; see discussion below) affix only; the "." specifies a match with 
an unflagged affix only. Omission of this second member allows the 
indicated affix to match in either event. The remaining elements of the MC 
rule form the fourth field, a list of categorizations, possibly with 
subcategorizations (feature-value pairs), to be assigned to a successfully 
matched stem-affix combination. 

During the course of applying the MC, intermediate categorizations 
of the stem in combination with some of its affixes are obtained. Each 
of these partial results is marked with the number of affixes thus far 
accounted for. Each successful application of an MC rule produces 
intermediate categorizations accounting for exactly one more affix. For each 
tentative decomposition of the input word produced by the analysis rules, 
the process of applying the MC is basically a cycle through intermediate 
categorizations. 

Given a decomposition whose stem appears in the lexicon, the initial 
steps are to count the number of affixes in it (call this N) and to set 
the intermediate categorizations to be the lexical categorizations of the 
stem, each with an indication that no affixes have been accounted for. 
Once this is done, MORPH begins considering the intermediate 
categorizations. 

For each intermediate categorization, a check is first made to see 
if the number of affixes accounted for is equal to N. If this is so, the 
categorization becomes input to the next set of rules, the redundancy 
rules. For example, in the case of the trivial decomposition where N = 0, 
the lexical categorizations of the simple stem are the end result of the 
(vacuous) application of the MC. 

If affixes remain to be accounted for, the MC rules applying to the 
category label of this particular intermediate categorization are obtained. 
Each such rule is checked, in order, to see if a match can be found. The 
feature-value pair, if present in the rule, must be found among the 
feature-value pairs of the intermediate categorization. If this 
requirement is met, attention turns to the affixes. The prefixes or suffixes 
specified by the rule must match an unbroken string in the decomposition, 
starting at the position adjacent to the stem. Flagging restrictions in 
the rule must be met by the match. Further affixes may be present in the 
decomposition; all such would be farther from the stem than all affixes 
participating in a match. 

If no MC rule for the category label produces a successful match, 
the intermediate categorization is discarded; i.e., the proposed analysis 
path contains an illegal stem-affix combination. If a successful match 
is made, the last affix of the decomposition which participated in the 
match is flagged, if it had not been flagged previously by successful 
application of an MC rule to another intermediate categorization. 
Flagging remains in effect while all intermediate categorizations of a 
given decomposition are processed (thus flagging alone cannot serve to 
keep track of how many affixes have been accounted for in a given 
intermediate categorization). 

Successful application of an MC rule yields, from the rule's last 
field, new intermediate categorizations, each of which is marked as hav- 
ing accounted for one more affix than the old intermediate categorization. 
If a new categorization has the same category label as the old, feature- 
value pairs of the two are merged. The new intermediate categorizations 
are added to the top of the list of those to be considered. 
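
The cycle through intermediate categorizations can be sketched in Python as follows. The data layout (tuples and dictionaries), the names `matches` and `apply_mc`, and the decision to stop at the first rule that matches a given categorization are assumptions of the sketch; the text leaves the last point open. Only suffix rules are modelled, written as (category, optional subcategory pair, suffix pattern, resulting categorizations).

```python
MC_RULES = [
    ("V", None, [("ING", ".")],
     [("V", {"ING": "PLUS"}), ("ADJ", {"ING": "PLUS"}),
      ("N", {"CMNF": "CMN", "ANIM": "MINUS", "ING": "PLUS"})]),
    ("V", None, [("S", ".")], [("V", {"TNS": "PRES", "NUM": "SG"})]),
    ("V", ("TRANS", "MINUS"), [("D", ".")], [("V", {"TNS": "PST"})]),
    ("V", None, [("D", ".")], [("V", {"TNS": "PST"}), ("ADJ", {})]),
    ("N", None, [("S", None)], [("N", {"NUM": "PL"})]),
    ("N", None, [("ING", "*"), ("S", ".")], [("N", {"NUM": "PL"})]),
]


def matches(pattern, affixes, flagged):
    """Pattern must match an unbroken run of affixes starting at the stem."""
    if len(pattern) > len(affixes):
        return False
    for i, (affix, req) in enumerate(pattern):
        if affixes[i] != affix:
            return False
        if req == "*" and not flagged[i]:   # "*": flagged affix only
            return False
        if req == "." and flagged[i]:       # ".": unflagged affix only
            return False
    return True


def apply_mc(lexical, affixes, rules=MC_RULES):
    """Return the categorizations that account for every affix."""
    n = len(affixes)
    flagged = [False] * n                   # flags persist for the whole decomposition
    pending = [(0, cat, dict(feats)) for cat, feats in lexical]
    finished = []
    while pending:
        count, cat, feats = pending.pop(0)
        if count == n:                      # all affixes accounted for
            finished.append((cat, feats))
            continue
        for r_cat, r_sub, pattern, results in rules:
            if r_cat != cat:
                continue
            if r_sub is not None and feats.get(r_sub[0]) != r_sub[1]:
                continue
            if not matches(pattern, affixes, flagged):
                continue
            flagged[len(pattern) - 1] = True  # flag last affix in the match
            new = []
            for n_cat, n_feats in results:
                # Same category label: merge old and new feature-value pairs.
                merged = {**feats, **n_feats} if n_cat == cat else dict(n_feats)
                new.append((count + 1, n_cat, merged))
            pending = new + pending         # added to the top of the list
            break
        # If no rule matched, the categorization is simply discarded.
    return finished


holdings = apply_mc([("V", {"TRANS": "PLUS"}), ("N", {"CMNF": "CMN"})], ["ING", "S"])
print(holdings)
```

For a stem categorized as a transitive verb and a common noun with the suffixes (ING) (S), the sketch leaves a single plural-noun categorization, and a noun stem followed by (NESS) yields nothing.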

At the end of the application of the MC, zero or more of the 
decompositions of the input word produced by the analysis rules will have one 
or more categorizations assigned to them. MORPH retains the information 
as to which stem underlies each such categorization. If no 
categorizations appear at this stage, it is as if the stem(s) were not found in the 
lexicon: MORPH cannot analyze the input word. Categorizations produced 
by the use of the MC are now sent through the redundancy rules. 

((V) (TNS PRES) (NUM PL)) 

Figure 3 

Sample Redundancy Rule 

The role of the redundancy rules is to complete the categorization 
of a word by specifying the values of those of its features which have 
not already been specified in the lexicon or by the MC. Alternatively 
they may be thought of as filling in the morphemes with zero graphemic 
reflexes. The particular information supplied by the RD in the present 
implementation is in the form of feature-value pairs. Each RD rule 
(Figure 3) begins with a categorization indicating the lexical category 
to which it applies. As with MC rules, this categorization may include 
one subcategorizing feature-value pair. The remainder of the rule is a 
list of feature-value pairs. When a rule is found to be applicable to a 
categorization, these pairs are considered, and if none of their feature 
labels are present in the categorization of the word (i.e., no conflicts 
are found), the pairs are added to the categorization. The result of 
applying the RD is the set of final categorizations for an input word. 
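
The application of redundancy rules described above can be sketched in Python. The rule encoding (category, optional subcategorizing pair, pairs to add) and the name `apply_rd` are assumptions of the sketch; the four rules follow the redundancy rules of the present implementation.

```python
RD_RULES = [
    ("V", None, {"TNS": "PRES", "NUM": "PL"}),
    ("N", None, {"NUM": "SG"}),
    ("N", ("HUM", "PLUS"), {"ANIM": "PLUS"}),
    ("N", ("ANIM", "MINUS"), {"HUM": "MINUS"}),
]


def apply_rd(cat, feats, rules=RD_RULES):
    """Fill in unspecified features; a rule is skipped on any conflict."""
    out = dict(feats)
    for r_cat, r_sub, pairs in rules:
        if r_cat != cat:
            continue
        if r_sub is not None and out.get(r_sub[0]) != r_sub[1]:
            continue                        # subcategorization not met
        if any(label in out for label in pairs):
            continue                        # a feature label is already present
        out.update(pairs)
    return out


print(apply_rd("V", {}))
print(apply_rd("V", {"TNS": "PST"}))
print(apply_rd("N", {"NUM": "PL", "CMNF": "CMN", "ANIM": "MINUS", "ING": "PLUS"}))
```

A rule's pairs are added all together or not at all, so a verb already marked for tense receives neither the default tense nor the default number.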

We now give the present AN, MC, and RD, with comments on each set. 
An example of the processing of an input word by MORPH using these sets 
of rules is then given. 
Analysis rules: 

1) ($SAVE $REV ($0 S ($NOT I O S U)) (1 (S) 3) INGS) 

2) ($SAVE $REV ($0 D E) (1 (D)) EINSB) 

3) ($SAVE $REV ($0 G N I) (1 (ING)) EINSB END) 

4) (INGS $SAVE $REV ($0 (S) G N I) (1 2 (ING)) EINSDF) 

5) ($REV ($0 (S) E ($OR I O)) (1 2 4) Y END) 

6) (EINSB $REV ($0 ($OR (D) (ING)) ($OR L R) ($OR D T Z)) (1 2 E 3 4) END) 

7) ($REV ($0 ($OR (D) (ING)) ($OR C U V Z)) (1 2 E 3) END) 

8) (EINSDF $REV (($OR (D) (ING)) ($OR N T) ($OR A I) ($NOT A)) (1 E 2 3 4) END) 

9) ($REV (($OR (D) (ING)) ($OR L R) U ($NOT A O)) (1 E 2 3 4) END) 

10) (Y $REV ($0 ($OR (S) (D)) I) (1 2 Y) END) 

11) ($REV (($OR (D) (ING)) ($OR L M N) 2) (1 2) END) 

Rule 1) detaches the suffix -s, reducing nouns to the singular form 
and (third-person) verbs to the present plural or infinitive form. The 
restriction on the letter preceding the s eliminates many incorrect 
analyses, such as for analysis, grass, etc. At the same time some correct 
analyses are prevented, e.g. taxis. 

Rules 2) through 4) detach the suffixes -ed and -ing, rule 4) 
covering the -ing + -s case. These are the only affixes which this set of 
analysis rules processes; since all are suffixes, the rules all make use 
of $REV, specifying right-to-left processing. Rules 5) through 11) 
attempt to restore stems to their correct form. For instance, rule 9) 
restores the e on the stem of such inputs as manufacturing, ruled, etc. 
The restriction prevents incorrectly adding the e to the stem of such 
words as hauling, pouring, etc. Only two known incorrect analyses are 
made, for auguring and murmuring. Rule 10) replaces i with y, and rule 
11) removes the second of doubled consonants, as in programming. 
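
The spelling condition behind rule 9) can be restated as a regular expression over the detached stem. This is a rough Python equivalent for illustration, not the rule as MORPH executes it; the function name is hypothetical and the stem is assumed to be given in upper case with its -ed or -ing suffix already removed.

```python
import re

# Stem ends in L or R, preceded by U, preceded by any letter other than A or O.
_E_RESTORE = re.compile(r"[^AO]U[LR]$")


def restore_e(stem: str) -> str:
    """Append the stem-final E when the spelling condition holds."""
    return stem + "E" if _E_RESTORE.search(stem) else stem


for stem in ("MANUFACTUR", "RUL", "HAUL", "POUR", "AUGUR", "MURMUR"):
    print(stem, "->", restore_e(stem))
```

The last two stems show the two known incorrect analyses: the condition also fires for augur and murmur.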
Morpheme-combinatorial rules: 

1) ((V) NIL ((ING .)) (V (ING PLUS)) (ADJ (ING PLUS)) (N (CMNF CMN) 
(ANIM MINUS) (ING PLUS)) ) 

2) ((V) NIL ((S .)) (V (TNS PRES) (NUM SG)) ) 

3) ((V (TRANS MINUS)) NIL ((D .)) (V (TNS PST)) ) 

4) ((V) NIL ((D .)) (V (TNS PST)) (ADJ) ) 

5) ((N) NIL ((S)) (N (NUM PL)) ) 

6) ((N) NIL ((ING *)(S .)) (N (NUM PL)) ) 

For this set of rules, the legal combinations are: a) from rule 1), 
verb + -ing, yielding a verb, adjective or noun with appropriate 
feature-value pairs; b) verb + -s, with the information obtained from the suffix 
retained by encoding it in feature-value pairs; c) verb + -ed, except 
that for intransitive verbs the participle (adjective) form is not allowed, 
so two rules are needed; d) noun + -s, as expected, where rule 6) covers 
the -ings case, which involves a path through rule 1)'s categorization of 
verb + -ing as a noun. The use of the "." denoting a match with only an 
unflagged affix in the first four rules assumes that the lexical 
categorizations of a word have been ordered in the lexicon so that any one with 
the category label V is first. Since an entry in the lexicon might have 
categories V and/or N, rule 5) must allow a match with either a flagged 
or an unflagged affix. 
Redundancy rules: 

1) ((V) (TNS PRES) (NUM PL)) 

2) ((N) (NUM SG)) 

3) ((N (HUM PLUS)) (ANIM PLUS)) 

4) ((N (ANIM MINUS)) (HUM MINUS)) 

These rules specify information as follows: If a verb has neither 
tense nor number specified, it is taken as present plural; nouns not 
marked plural are assumed singular; appropriate interpretations are made 
for the features of humanity and animacy for nouns. 

Consider the input word holdings. MORPH first applies the analysis 
rules to the string of its letters, (HOLDINGS). Rule 1) produces 
(HOLDING (S)), and $SAVE causes retention of the prior analysis 
(HOLDINGS). Control is passed to rule 4), INGS, which applies, detaching 
the second suffix to produce (HOLD (ING) (S)), and saving (HOLDING (S)) 
as a possible analysis. Control passes to EINSDF, rule 8), and 
subsequently to rules 9), 10), and 11), none of which apply (correctly, since 
HOLD is a correctly spelled stem). Thus there are three tentative 
decompositions: (HOLDINGS), (HOLDING (S)), and (HOLD (ING) (S)). 

Neither holdings nor holding is a lexical stem, so the only tentative 
decomposition of interest is (HOLD (ING) (S)). A possible lexical entry 
for hold is (HOLD (V (TRANS PLUS)) (N (CMNF CMN)) ), which indicates that 
hold is a transitive verb or a common noun. (Note the ordering of the 
categorizations, with the V before the N.) Given these lexical 
categorizations, the application of the MC begins with the intermediate 
categorizations (0 V (TRANS PLUS)) and (0 N (CMNF CMN)). The zeros 
indicate that no affixes have been accounted for. 

The first intermediate categorization is examined. Its category 
label is V, and the four MC rules pertaining to this label are considered. 
The first of these, rule 1), matches (the (ING) of the decomposition is 
unflagged, as required). This causes flagging of the (ING) so that the 
tentative decomposition now appears as (HOLD (ING *) (S)). The three 
categorizations of MC rule 1)'s last field become intermediate 
categorizations with 1's associated with them to indicate that they have 
accounted for one affix. Thus the list of intermediate categorizations 
is now (1 V (ING PLUS)), (1 ADJ (ING PLUS)), (1 N (CMNF CMN) (ANIM MINUS) 
(ING PLUS)) and (0 N (CMNF CMN)). Each of these must be examined. The 
category label of the first is V, but MC rule 1) does not match this time, 
since the (ING) is now flagged. No other rules apply and the counter, 
1, does not equal N (N = 2, since there are two suffixes), so this path 
is discarded (note that the flagging operation is crucial here). The 
second intermediate categorization is also discarded, since there are no 
MC rules applying to the category ADJ and its counter is also 1. 
Rejection of these two paths corresponds to the fact that the progressive 
verb and participial adjective forms holding do not take an -s. 

The next intermediate categorization has category label N, and there 
are two MC rules for this label. The first, rule 5), does not apply 
because the (S) is not adjacent to the stem in the decomposition. 
However, the next applies, changing the decomposition to (HOLD (ING *) 
(S *)) and yielding the intermediate categorization (2 N (NUM PL) 
(CMNF CMN) (ANIM MINUS) (ING PLUS)). The 2 indicates that this 
categorization has accounted for both suffixes and is to be sent to the RD. 

The remaining intermediate categorization, (0 N (CMNF CMN)), 
triggers an attempt to apply rules 5) and 6) again. 5) does not match, 
as before, and now 6) does not either, due to the flag on the (S *). 
Therefore, this categorization is discarded, as it should be since nouns 
do not take the suffix -ing. Note again the importance of the flagging 
operation. 

The RD rules associated with N, namely 2), 3), and 4), are now 
applied to the categorization (N (NUM PL) (CMNF CMN) (ANIM MINUS) (ING PLUS)). 
Rule 2) has no effect since number has already been specified. Rule 3) 
does not apply since the feature-value pair restriction is not met, but 
rule 4) applies, yielding as final output from MORPH the categorization 
(N (HUM MINUS) (NUM PL) (CMNF CMN) (ANIM MINUS) (ING PLUS)). MORPH also 
reports that the stem associated with this analysis of holdings is hold. 

Discussion 

The present sets of rules are quite small, removing only the most 
common inflectional suffixes. Using these rules on the small test 
vocabulary given it so far, MORPH has performed accurately at an average 
speed of about 0.7 seconds per word on the 360/40. The questions of 
accuracy and efficiency become more critical, of course, as more 
comprehensive morphological analysis is attempted. In an earlier 
implementation [14], a considerably larger set of AN was programmed in FORTRAN 
on the 7090, but without any corresponding MC or RD. This set of AN 
included rules for the suffixes -ly, -ness, -or, -er, -est, -ible, -able, 
and -less, and the prefixes un-, in-, and one other, as well as the rules for 
-s, -ed, and -ing. This program analyzed a vocabulary of 112 words in 
about 5 1/2 seconds, but with many spurious analyses, as would be 
expected (some interesting examples: (IN)+cape+(ABLE) 'incapable'; 
(UN)+(LESS) 'unless'; (IN)+teg+(ER) 'integer'; eith+(ER) 'either'). 
Inspection of the results of this and subsequent experiments with this 
set of rules disclosed no spurious analyses which could not, in 
principle, be rejected by use of the complete procedure as described.³ 

An important result of the experiments with this program was the 
realization, obvious in retrospect, that proper names require special 
treatment, perhaps a distinguishing diacritic upon input. Otherwise 
many spurious analyses result, e.g., Socrate+(S), Grell+(ING), Schill+(ER). 
Another result was the discovery that there is far greater homography 
with English prefixes than with suffixes, to the extent that the value 
of prefixational analysis is questionable. 

³ There are, of course, cases of genuine orthographic ambiguity, words for 
which there are two or more valid morphological analyses, one of which 
will always be spurious in a given context. For example, number will 
always be analyzed as a comparative adjective as well as a noun. 